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Classical multivariate principal component analysis has been ex- 
tended to functional data and termed functional principal component 
analysis (FPCA). Most existing FPCA approaches do not accommo- 
date covariate information, and it is the goal of this paper to develop 
two methods that do. In the first approach, both the mean and covari- 
ance functions depend on the covariate Z and time scale t while in the 
second approach only the mean function depends on the covariate Z. 
Both new approaches accommodate additional measurement errors 
and functional data sampled at regular time grids as well as sparse 
longitudinal data sampled at irregular time grids. The first approach 
to fully adjust both the mean and covariance functions adapts more 
to the data but is computationally more intensive than the approach 
to adjust the covariate effects on the mean function only. We develop 
general asymptotic theory for both approaches and compare their 
performance numerically through simulation studies and a data set. 

1. Introduction. Principal component analysis is a standard dimension 
reduction tool for multivariate data and has been extended to functional data 
that are in the form of random curves. Because functional data are intrin- 
sically infinite dimensional, dimension reduction is essential to analyze such 
data. In addition to Ferraty and Vieu (2006) and Wu and Zhang (2006), the 
sequence of monographs by Ramsay and Silverman (2002, 2005) provide a 
tutorial for the methodology and applications of "Functional Data Anal- 
ysis" (FDA). A sizable literature exists for FPCA, functional approaches 
to conduct principal component analysis, when entire curves are observed 
for each subject or in practical terms when subjects are measured at 
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a dense grid of time points [see, e.g., Rao (1958), Dauxois, Pousse and Romain 
(1982), Besse and Ramsay (1986), Castro, Lawton and Sylvestre (1986), 
Rice and Silverman (1991), Boente and Fraiman (2000), Bosq (2000), Car- 
dot (2000, 2006), Mas and Menneteau (2003) and Hall and Hosseini-Nasab 
(2006)]. Kneip and Utikal (2001) used methods of FDA to assess the vari- 
ability of densities for data sets from different populations. When functional 
data are observed at irregular time points, perhaps just a few time points 
per subject, they are usually referred as longitudinal data since they of- 
ten arise from longitudinal studies. Rice (2004) and Hall, Miiller and Wang 
(2006) described the intrinsic similarities and differences between FDA and 
longitudinal data analysis. Longitudinal data are often sparse with few 
measurements per subject and noisy with measurement errors (or random 
fluctuations). However, these difficulties can be overcome in most situa- 
tions, so it is still possible to conduct FPCA [Shi, Weiss and Taylor (1996), 
James, Hastie and Suger (2000), Rice and Wu (2001), Yao, Miiller and Wang 
(2005), Paul and Peng (2009) and Peng and Paul (2009)]. 

The aforementioned FPCA approaches treat all functional data as if they 
come from the same population. When the covariate information is avail- 
able, some non-FPCA approaches such as functional mixed effects models 
[Wang (1998) and Guo (2002)] and semiparametric mixed effects models 
Zhang et al. (1998) are proposed. There has been little work involving co- 
variate information in the framework of FPCA although it might be of par- 
ticular interest in many situations, for example, to study the modes of vari- 
ation of the data. Furthermore, FPCA is an effective dimension reduction 
method. Chiou, Miiller and Wang (2003) considered a general approach in- 
corporating a vector covariate effect through a semiparametric model. Their 
approach consists of two steps. In the first step, traditional FPCA was per- 
formed on all subjects ignoring the covariate information. This resulted in 
a Karhunen-Loeve expansion [see (2.2)] for each subject X{t) for which 
the conditional expectation of X{t) given the covariate Z was obtained 
and subsequently estimated through a semiparametric approach. A differ- 
ent approach was proposed in Cardot (2006), who considered conditional 
FPCA through nonparametric kernel estimators of the conditional mean 
functions and conditional variance functions. A key assumption for both ap- 
proaches is that the trajectories of the functional data are either completely 
observed or densely recorded over time. Both assumptions are rarely satis- 
fied in longitudinal medical or social studies. Specifically, the approach in 
Chiou, Miiller and Wang (2003) is not suitable for extension to sparse lon- 
gitudinal data as the conditional principal components cannot be estimated 
or approximated consistently for sparse longitudinal data. We propose a 
unified approach in Section 2 to model the mean function and two different 
approaches to model the covariance function. 
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Little is known on how to incorporate covariate information in FPCA for 
sparse longitudinal data, so our goal in this paper is to provide a unified 
platform to incorporate the covariate information that is applicable to both 
functional and longitudinal data. Two different approaches are proposed; 
one is based on conditional FPCA and the other adjusts the covariate effect 
on the mean function only. We derive uniform consistency and asymptotical 
normality for the mean and covariance functions for kernel and local poly- 
nomial smoothers. The two approaches are compared numerically through 
a simulation study and illustrated with a data example. 

The rest of this paper is organized as follows: Section 2 introduces the 
two new approaches and their estimation procedures. Asymptotic results 
and the theoretical properties of the proposed estimators are described in 
Section 3 with proofs in the Appendix. Practical implementations of the new 
approaches and simulation studies are discussed in Section 4. In Section 5, we 
employ both approaches to the Mexican Flies data in Carey et al. (2005) and 
compare them to three FPCA approaches James, Hastie and Suger (2000), 
Yao, Miiller and Wang (2005), Peng and Paul (2009) that do not incorpo- 
rate covariate information. Conclusions are in Section 6. 

2. Methodology. Ignoring the covariate information for the moment and 
consider the random functions X{t) with mean /i(t) and covariance T(t,s). 
FPCA in this simple setting corresponds to a spectral decomposition of the 
covariance F and leads to the Karhunen-Loeve decomposition of the random 
function 



where 4>k{i) is the eigenfunction of the covariance function T{s,t) corre- 
sponding to the kth largest eigenvalues, and = J^{X(t) — fj.{t)}(pk{t) dt is 
the kth. functional principal component score. In the presence of a covariate 
Z = z we view X{t, z) as a random function with mean function /i(t, z) and 
covariance function F(t, s,z) where s and t are in a compact time interval 
T- In this paper, the random function X{t,z) are not observable because 
measurements are taken on discrete time points and there may be measure- 
ment errors. This is different from the situation considered in Cardot (2006) 
where a covariate adjusted FPCA was proposed under the assumption that 
the entire function X[t,z) can be observed without errors. 

2.1. Model. We consider two ways to extend the FPCA approach to 
accommodate covariate information. Both approaches consist of two parts: 
a systematic part corresponding to the mean function and a stochastic part 
comprising the random components that reflect the covariance structure 
of the longitudinal data. In both approaches we do not assume that we 
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know the structure of z) other than that it is a smooth function, so 
we will need to estimate it nonparametrically. The difference between the 
two approaches is in the handling of the covariance structure. Conceptually, 
the covariate Z can be any vector that has continuous distribution, but due 
to the curse of dimensionality only low-dimensional Z can be used. Some 
dimension reduction approaches will be necessary for high-dimensional Z 
and are beyond the scope of this paper. 

In the first approach, it is assumed that the eigenfunctions of T{t,s,z) 
vary with z so that there exists an orthogonal expansion of T (in the 
sense) in terms of eigenfunctions (f)k{t^z) and nonincreasing eigenval- 
ues Afc(z) : r(i, s, z) = Afc(z)(^fc(t, z)(^fc(s, z). Thus the random trajectory 
X{t,z) can be represented as 

(2.2) X(i, z) = fx{t, z) + Y^ Ak{z)(l)kit, z), 

k=l 

where Ak{z) are uncorrelated random variables with mean and variance 
\kiz). Again, we will model the covariance surface nonparametrically, as- 
suming that it is a smooth function of t, s and z. Since both the mean and 
covariance functions have been adjusted by the covariate Z, we call this fully 
adjusted functional principal component analysis and abbreviate it as fF- 
PCA. This approach to adjust the covariate effects is conceptually equivalent 
to the conditional FPCA approach in Cardot (2006) but differs crucially in 
the way of estimation due to differences in the data design we consider here. 
This crucial difference in the data design also triggers a much different theo- 
retical framework. For one-dimensional Z, only one-dimensional smoothing 
is needed in Cardot (2006) to estimate both the mean and covariance func- 
tion along the Z-direction at each time location since the entire function 
X{t,z) is observed. 

When ii{t,z) = f3{t)z and the stochastic components '^k=i ^k{z)(j)k{t, z) 
in model (2.2) adopts a time- varying linear structure b{t)z for some unknown 
function /3 and random function b, model (2.2) yields the varying coefficient 
random effects model in Guo (2002). When ^{t,z) takes a partial linear 
form f{t) + f3z and the stochastic component also takes a partial linear 
form u{t) + bZ, for some unknown functions / and u, parameter /? and 
random variable b, model (2.2) reduces to the partial linear mixed model in 
Zhang et al. (1998). 

In the second approach, one can take advantage of the fact that the co- 
variate Z is a random variable and pool all subjects together after centering 
each individual curve at zero. This leads to a pooled covariance function 
T*{t,s) = J^E{{X{t,z) — fi{t,z)){X{s,z) — fi{s,z))}g{z)dz where g is the 
p.d.f. of Z on Z, and T*{t,s) is assumed to be a smooth function of t 
and s. Consequently, there exists an orthogonal expansion (in the sense) 
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in terms of eigenfunctions (j)k nonincreasing eigenvalues such that 

(2.3) X{t,z)=fiit,z) + Y,^m{t), 

k=l 

where ^4^ are uncorrelated random variable with E{Al,} = and var{A^} = 
A^. This approach has the advantage that the covariance function can be es- 
timated with a lower-dimensional smoother than its counterpart in fFPCA, 
accelerating the rate of convergence compared to fFPCA. We abbreviate this 
mean adjusted functional principal component analysis on X{t,z) — fJ.{t,z) 
as "mFPCA" where "m" stands for the mean adjusting operation. The es- 
timation procedure for mFPCA is described in Section 2.2.2. Conceptually, 
the fFPCA approach should fit the data better as it adapts to the covariate 
information in covariance estimation while mFPCA does not. This benefit 
may be offset by inferior practical performance if the data are sparse. Our 
simulation results in Section 4 reflect limited benefits of fFPCA, so one may 
prefer the mFPCA approach in many applications or try both approaches, 
unless the eigenfunctions vary substantially across the covariate values. 

2.2. Estimation. In many situations one can only observe the processes 
X{t, z) intermittently and possibility with measurement errors. Let Yij be 
the jth observation of the random function Xj, made at a random time 
Tij G T with a covariate Zi^Z and measurement error e^j where i = 1, . . . , n, 
and J = 1, . . . , iVj. Here we assume that the measurement schedule is a 
random sample of size Ni and Ni is assumed to be i.i.d. and independent 
of all other random variables. We also assume that the measurement errors 
are i.i.d. with mean and a constant variance o"^ and are independent of 
the random coefficients ^fc(z) or J^^ under model (2.2) or (2.3), respectively. 
Thus the observed data are 

(2.4) y,J=X,(^,,•,Zi) + e^J. 

The key steps in our FPCA approach are to estimate the mean and co- 
variance function. The corresponding eigenvalues and eigenfunctions can be 
obtained easily through the eigen-equation after the covariance function has 
been estimated. The mean functions for fFPCA and mFPCA are the same 
and can be estimated using any two-dimensional scatter-plot smoother of Yij 
against {Tij,Zi), for j = 1, . . . , Ni,i = 1, . . . ,n. We provide general asymp- 
totic properties of any linear scatter-plot smoother of the mean function 
IJ,{t,z) and demonstrate in Section 3 these asymptotic properties on two lin- 
ear smoothers, the Nadaraya- Watson estimator (3.1) and the local linear 
estimator (3.2). 



6 



C.-R. JIANG AND J.-L. WANG 



Similarly, our covariance estimator can also be expressed as a scatter-plot 
smoother of the so called "raw covariances" defined below against {Tij,Tik): 

(2.5) Cijk = {Yij - fi{Tij, Zi)){Yik - il{Tik, Zi)). 

The covariance estimators are different for fFPCA and mFPCA. For one- 
dimensional Z, the former involves a three-dimensional smoother of Cijk 
against {Tij,Tik,Zi) for j,k = 1, . . . , Ni, i = 1, . . . ,n while the latter only 
requires a two-dimensional smoother of Cijk against {Tij,Tik) for j,k = 
1, . . . ,Ni,i = 1, . . . ,n. In principle, one can employ any linear smoother. We 
illustrate the theorems for the Nadaraya- Watson estimators and local linear 
estimators [Fan and Gijbels (1996)] in Section 3. 

2.2.1. fFPCA. Note that since 

cov{Yij , Yik I Tij , Tik , Zi) 

= cov{X{Tij,Zi),X(Tik, Zi)) + a'^djk, 

where 6jk is 1 if j = and otherwise, the diagonal of the "raw" covariances 
Cijk in (2-5) should not be included in the covariance function smoothing 
step. With this in mind the local linear smoother for the covariance function 
r(t, s, z) is 



TL{t, s, z) = f^Q where 

( n 



^ = argmin<^ ^ K3 

{ i=l l<jj^k<Ni 



13 

(2.6) 



t Tij s Tik z Zi 



2 



hG,t ' hG,t ' he, 

x[Cijk-Wo + Pi(.Tij-t) 

+ P2{Tik - s) + f^siZ, - z))] 

and is a three-dimensional kernel function satisfying (A. 2). 

Next we aim to estimate the variance V{t,z) = r(t,t,z) + cr^ of Y{t) for 
a given z. Let K2 be a two-dimensional kernel function satisfying (A.l) and 
V{t, z) be the local linear smoother using only the diagonal time elements; 
then 

V{t,z)=^o, 

where 

N, 



^ = arg min V V ( ^7-^ , ^7— ^ ) 
a j^ij^i V hv,t hv,z ) 

X [Cw - /3o - /3i(ry -t)- P2{Zi -z)f. 
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The variance cj^ of the measurement error can be estimated by averaging 
{V{t, z) — TL{t, t, z)) over a range of t. For stabihty, one may prefer to use a 
trimmed mean restricting the averaging to take place over a central portion 
of the time domain. We find the recommendation in Yao, Miiller and Wang 
(2005) to use a trimmed mean based on the central 50% of the time domain 
satisfactory. Specifically, this leads to 

(2.7) = ^ ~ "'^ 

where 71 is the interval [inf{t : i E T} + |T|/4, sup{i : i G T} - |r|/4] with the 
notation \I\ denoting the length of a generic interval I. If the variances of 
the measurement errors vary over time and z, the variance function a'^{t,z) 
can be estimated directly as V{t,z) — T{t,t,z). 

The solutions of the eigen-equations, j Tiit, s, z)(pkis, z) ds = Xk{z)(pk{t, z), 
where the 4)k{t,z) satisfies J (^^.{t, z) dt = 1 and J 4>kit, z)4>m{t, z) dt = for 
m < k, are used to estimate the eigenfunctions and eigenvalues. It now re- 
mains to estimate the principal component scores 

Aik{Zi) = J Mt,Zi)[Xi{t,Zi)-fi{t,Z,)]dt 

for the ith subject. Due to measurement errors and the intermittent mea- 
surement schedules, the approaches in Chiou, Miiller and Wang (2003) and 
Cardot (2006) are not applicable to estimate these scores. Instead, the ap- 
proach in Yao, Miiller and Wang (2005) aimed at estimating the conditional 
expectation E{Aik{Zi)\Yi) is well suited to estimate the principal compo- 
nent scores where Yj = {Yn, . . . ,YiNi)'^ ■ Under the assumption that Yj is 
multivariate normal, this leads to the estimate 

A,iZ,) = X,^ltz^XY,-fi,), 

where fi^ = {fi{Tii, Zi), . . . , fi{TiN,, Zi))'^ , {ty.,)j^k = tL{Tij,Tik,Zi) +a'^6jk 
and 0jfc = {4>k{Tii, Zi), . . . ,<j)k{TiNi, Zi)Y . 

2.2.2. mFPCA. The estimation of T*{s,t) is similar to the procedure 
in Yao, Miiller and Wang (2005) except that we use Cijk as the raw co- 
variances. Let T*{t,s) be the covariance estimator based on a local linear 
smoother; then 

f*(t,s) = /3o where 

3 = argminJV V ^2 ( ^7-^ , I 
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x[C,,-fc-(/3o + /?i(r,,--t) 

+ /32(T,fc-s))]'|, 

where t,s £T and K2 is defined in (A.l). Let V*{t) be the local linear 
smoother focusing on the diagonal values {r*(t,t) + cr^}; then 

where 

P = argmin V V K, [d^, - f3o - Pi{T,, - t)]\ 

/3 V hv J 

where Ki is a kernel function with compact support, symmetric and Holder 
continuous. Again, a "trimmed" mean of {V*{t) — T*[t, t)) is used to estimate 
o"^ similar to (2.7). 

The solutions of the eigen-equations, fT*{t,s)(l)l.{s)ds = X^c^Kt), where 
the (i)l{t) satisfies J{^%{t)Y dt = 1 and f 4>lit)4>m{t) dt = for m < fc, are 
used to estimate the eigenfunctions and eigenvalues. The principal com- 
ponent scores A*/^ for subject i are estimated as in Yao, Miiller and Wang 
(2005) through 

where and fi^ are defined as in Section 2.2.1, and (S^ )j^k and <^^^ are de- 
fined as (S*, )j- ^ = f *(T,j,T,fc) + {a*)Hjk and 4>*ik = ikiTn), k{T^N^7 ■ 

2.3. Bandwidth selection and number of eigenfunctions. The bandwidths 
for the estimated mean function are chosen via the leave-one-curve-out cross- 
validation suggested by Rice and Silverman (1991). However, the bandwidths 
of the covariance function estimators are chosen via a fc-fold cross-validation 
procedure to save computing time. Below we define the /c-fold cross-validation 
method for the bandwidths selection of T*{t,s). The formula for r(i,s,z) is 
similar. 

Suppose that the subjects are randomly assigned to k sets (5i, 5*2, . . . , Sk)- 
k 

(2.9) /i = argmin^^ ^ {C,,m - f *(-^^)(^,,•,^,^)}^ 
^ 1=1 ieSc l<jf^m<Ni 

where t*^-^<^\Tij ^TiffYi) is the estimated covariance function at (Tij,Tim) 
when the subjects in 5^ are not used to estimate T*{t,s). We found the 
ten-fold {k = 10) method to have satisfactory performance. 
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Three criteria to choose the number of eigenfunctions K are discussed 
in the simulation study section. Suppose that the first K eigenfunctions 
are used to predict the trajectories; given t gT and z £ Z, the predicted 
trajectory of Xi{t,z) based on the first K eigenfunctions will be 

K 

(fFPCA) Xf (t, z) = fLLit, z) + Y^ Ak{z)Mt, z), 

k=l 
K 

(mFPCA) Xf (t, z) = ftLit, z) + Y^ A*,4>Ut). 

k=l 

3. Asymptotic results. For simplicity, the covariate Z in this section will 
be univariate, and Ni, . . . ,Nn are i.i.d. copies of some random variable N. 
We first focus on the asymptotic distribution of linear smoothers of the mean 
function. 



Asymptotic results for mean functions. A general theory (Lemma C.2) 
for two-dimensional kernel- weighted estimators is provided in the Appendix; 
from there the asymptotic normality of the Nadaraya-Watson kernel esti- 
mator fi^y/(t,z) and local linear estimator p,L{t,z) of fJ-(t,z) follows. 

Specifically, 

,,,, , , EILi Ef=i K2iit - Tij)/h,„ jz - Z,)/h,,z)Y,j 

EILi Ef=i i^2((t - r,,)/v,i, (^ - ^.)/V.) 

fiL{t,z) = j3Q where for /3 = (/3o,/3i,/32), 

n N, 



(3.2) 

X [Yij - /3o - Pim, -t)- P2{Zi -z)f. 



Theorem 3.1. Under assumptions A. 3, A. 5 and A. 6, B.1-B.4, and as- 
suming ~^ P/j. '^''^d nE{N)h^ f. — )• for some < p^, < oo, 



nNh^^th^,z[fj-Nw{t, z) - ii{t, z)] ^ X(/3nw, Snw), 



where 
/3nw 



E 



ki+k2=2 



1 



j s\'S2''K2{si,S2)dsidS2 
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Snw = [Var(y|t, z)\\K2f]/f2{t, z),ai{t, z) = ^(t, z)f2{t, z), 
and f2{t,z) is the joint density of {T,Z). 

Theorem 3.2. Under assumptions A. 3, A. 5 and A. 6, B.1-B.4, and as- 
suming ~^ P/j.! '^''^d nE{N)h^ ^ — )• for some < /0^,t^ < oo, 



\JnNh^^th^,z[fi'L{t, z) - fi{t, z)] ^ N{I3l,^l), 



where 



2k2+l 



If o I — 

E ^;T0 4'4K2{si,S2)dsids2 ^^fc75ifcr^(*'^)^/^V^^' 

Si = [Var(y|i,z)||i^2|P]//2(i)-z) o^c^ f2{'t,z) is the joint density of {T,Z). 

Asymptotic results for covariance functions. We will need to consider 
three-dimensional smoothers to estimate the covariance function. Again, 
the asymptotic normalities of the Nadaraya-Watson kernel estimator and 
local linear estimator of the covariance function follow from Lemma D.2 in 
Appendix D. Here the Nadaraya-Watson kernel estimator of the covariance 
T{t,s,z) is defined as 

\i=l l<j^k<Ni ^ ^'^ ^'^ ^'^ ^ / 

hi<k^<N^ V^^' hG,t '1^ ^ 

For notational convenience, we focus on the case of conventional kernel of 
order (0, 2) and denote 

af = J J J ufK^ {ui , n2 , Ms) dui du2 du^ 

for i = l,2, 3, 

nE{N{N - l))/iS,t/iG,. ^ rl n^(iV(iV - ml^th%^, ^ r| 

and 

vs{t, s, z) = Var((yi - fi{Ti,Z)){Y2 - p{T2, Z))\Ti = t,T2 = s,Z = z) 
in the following two theorems. 
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Theorem 3.3. Under assumptions A.4-A.6, B.5-B.8, and assuming 
^ PG and nE{N{N - l))h]j ^ for some < pc, tq < oo, 

^ nN{N - l)/i|,^^/iG,2{f Nw(i, s, z) - T{t, s, z)} ^ A^(7nw, ^^nw), 

where 

7NW = M^i^i ^'^{t,s,z) + a2Ti -^r(t,s,2;) +(t|t2 ^r{t,s,z) 



+ o-|r2( -^r(i,s,z) ) (^g3it,s,z) ] \ / g3{t,s,z), 



'\dz J \dz 

f^NW = [vsit, s, z)\\Ksf]/ g3{t, s, z) 
and g3{t,s,z) is the joint density of {Ti,T2, Z). 

Theorem 3.4. Under assumptions A.4-A.6, B.5-B.8, assuming — t- 
PG, and nE{N{N — '^G f^''' ^^me < pc, tq < oo, 

^nN{N-l)hl ^hGA^L{t, s, z) - T{t, s, z)} ^ iV(7L, ^^l), 

where jl = H^f ^^(t, s, z) + ajn ^T{t, s, z) + cj|t2 £^T{t, s, z)}, = 
[v3{t, s, zjWK-sW^]/ gsit, s, z) , and g3(t,s,z) is the joint density of {Ti,T2,Z). 

Remarks. 1. The above asymptotic results reveal that standard optimal 
convergent rates for independent data are attained for all estimators when 
E{N) is finite. For instance, the convergence rate for both the Nadaraya- 
Watson and local linear estimates for the mean function is n^^^ which is 
the optimal convergence rate for a two-dimensional smoother under similar 
regularity conditions, and the convergence rate for both covariance function 
estimators is n^/^, also optimal for a similar three-dimensional smoother. 

2. The convergent rates of all estimators are faster when the expected 
number of measurements per subject E{N) — t- oo as there are more data 
available per subject. For instance, the convergence rate for both mean func- 
tion estimates and both covariance function estimates can be as arbitrarily 
close to n^/^ when E{N) — )• oo. Note that n^/^ is the optimal rate of conver- 
gence when the entire longitudinal process Y(-,Zi) can be observed for all 
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subjects i = 1, . . . , n; therefore smoothing is only required on the z-direction 
leading to a one-dimensional smoothing rate. 

The asymptotic normality of the mFPCA covariance estimator can be 
handled similar to Theorem 3.2, but it is in fact much simpler if we follow 
the arguments of Theorem 2 in Yao (2007). The proof follows from the weak 
uniform convergence of fi^t, z) in Lemma D.4, the asymptotic distributions 
of the estimators based on "raw covariances," Cijk, are identical to those 
based on Cijk = {Yij - IJ-{Tij,Zi)}{Yik - fi{Tik,Zi)}. Thus the Nadaraya- 
Watson estimator and local linear estimator of covariance based on Cijk ^-re 
asymptotically equivalent to those estimators based on Cijk- To save space, 
we present only the results for the local linear smoother in (2.8). 

Theorem 3.5. Under assumptions he* — 0, nE{N'^)h'Q, — )-oo, h^, x 
E{N^) 0, nE{N{N - l))h%, for some < r < oo, A.5-A.6, and 

E.1-E.3, 

^nN{N-l)hl,{t*{t,s)-T*{t,s)]^N{^\Q*), 

where 7* = ^ J u''Ki{u)du{^T*{t,s) + ^T*{t,s)], Q* = [v2it,s)\\Ki\\^]/ 
g2{t,s),V2(,t,s)=YaT{{Yi-fi{Ti,Z)){Y2-fi{T2,Z))\Ti = t,T2 = s) andg2{t,s) 
is the joint density of (Ti,T2). 

4. Simulation results. We compare the performance of the two covariate 
adjusted FPCA approaches in Section 2 with the estimator in 
Yao, Miiller and Wang (2005) which we term uFPCA with the prefix "u" 
suggesting that it is "unadjusted" FPCA. The simulation scheme is as fol- 
lows: for each subject, a covariate z is generated from C/(0, 1), its mean 
function is fi{t,z) = t + zsm{t) -|- (1 — z)cos(i) and its variance-covariance 
function is derived from two eigenfunctions (j)i{t,z) = — cos{7r{t + z/2))y/2 
and (j)2{t, z) = sin(7r(t -|- z/2))\/2, for < i < 1 with eigenvalues \i{z) = z/Q, 
X2{z) = z/36 and = for /c > 3. The specific principal component scores 
Aik{z) are generated from N{Q,\k{z)), and the additional measurement 
errors are assumed to be normally distributed with mean and variance 
(0.05)^. For the measurement scheme {tij} we use a nonequidistant "jit- 
tered" design. Specifically, an equally spaced grid {cq, . . . , C50} on [0, 1] with 
Co = and C50 = 1 is selected and jittered according to the plan Si = Ci + €i 
where Ci are i.i.d. with A^(0, 0.0001) and then constrained to be Si = if 
Si <0 and Si = 1 if Sj > 1. Each curve is sampled at a random number of 
points, {tij}, j = 1, . . . , Ni, where iVj are chosen from a discrete uniform dis- 
tribution {2, . . . , 10}, and the locations of the measurements are randomly 
chosen from {si, . . . , ,849} without replacement. The simulation consists of 
100 runs, and the number of subject is 100 in each run. 
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Epanechniknov kernels are used in the smoothing steps. The bandwidths 
for the mean surface estimator are chosen by leave-one-curve-out cross- 
validation while the bandwidths for the covariance estimator are chosen 
by a 10-fold cross-validation method to save computing time. Three criteria 
(AIC, BIC and fraction of variation explained (FVE) method) for choos- 
ing the value K are also compared. The AIC and BIC are defined as in 
Yao, Miiller and Wang (2005). 

The FVE method is defined as the minimum number of components 
needed to explain at least a specified fraction of the total variation. In the 
simulation, we choose K for uFPCA and mFPCA to be the minimum num- 
ber k satisfying (X^(Li '^i)/(Si=i -^O ^ 0.80, and for the fFPCA approach, 
this corresponds to selecting the smallest k satisfying ^'^^i Xi{z) \ {z) > 
0.80 for each subject with covariate value z. A major difference is that this 
type of FVE would allow subject-specific choice for the number of principal 
components in fFPCA. A problem is that the covariance estimate based on 
individually selected number of principal component may not yield a smooth 
covariance surface. To rectify this and to facilitate a uniform platform to 
compare the three approaches, we propose a global choice of K based on 
the 90th percentile of the individually selected k's for fFPCA. This global 
choice is somewhat objective and may give a slight benefit to fFPCA in 
fitting the observed data as compared to using either the mean or median 
value of k as the global choice. Both AIC and BIC approaches tend to choose 
too many eigenfunctions so they can predict the data well, while FVE is the 
best for selecting the correct model. However, it is inferior to the others for 
prediction as evident in Table 2. 

The mean integrated squared error of the covariance estimator for mF- 
PCA is 0.00046, the biases and standard errors of the two eigenvalues are 
-0.0102 (s.d. = 0.0121) and -0.0035 (s.d. = 0.0052), respectively. The av- 
eraged estimated eigenfunction of the 100 simulations is close to the true 
eigenfunctions as shown in Figure 1. This suggests that the covariance es- 
timator of mFPCA is sufficiently accurate. From Table 1 and Figure 2, the 
performance of fFPCA is generally satisfactory although the accuracy varies 
with the covariate. The estimate for the second eigenfunction at Z = 0.1 is 
poor due to the small eigenvalue 0.0028, so there is probably no need to 
include more than one eigenfunction for Z = 0.1. 

Next, we compare the three different model selection criteria of choosing 
the number K of eigenfunctions. We use the mean integrated squared error 
(MISE) for the true curves Xi{t,Zi), 

n .1 

(4.1) MISE = - V / {Xi{t, Zi) - X^{t, Zi)f dt 

as a criterion where K is the number of eigenfunctions used to predict the 
trajectory of each subject. The corresponding mean squared fitting errors 
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Fig. 1. First two eigenfunctions of the covariance and their estimates by mFPCA. 
(MSFE) is 

^ n Ni 

(4.2) MSFE = -J]-5](y,,-y,,)^ 

i=l * j=l 

An outlier was detected in the 6th run for mFPCA predicted trajectory, so 
we include two results in Table 2, one with all simulations and one with this 

Table 1 

Simulation results of fFPCA. The three rows corresponding to ISE are based on the 
average integrated sguared errors of the 100 simulations where ISE of g{-) for estimating 
a target function g{-) is defined as f^{g{t) — git))^ dt. The rows corresponding to \i{z) 
are the biases and standard deviation (in bracket) 



Covariate z 


0.1 


0.3 


0.5 


0.7 


0.9 


ISE of Tl 


0.00015 


0.00025 


0.00071 


0.0014 


0.0030 


ISE of ^i{t,z) 


0.0294 


0.0076 


0.0071 


0.0074 


0.0112 


ISE of hlt.z) 


0.2720 


0.0305 


0.0242 


0.0179 


0.0300 




0.0047 


-0.0041 


-0.0113 


-0.0202 


-0.0242 




(0.0073) 


(0.0106) 


(0.0181) 


(0.0205) 


(0.0333) 


\2{Z) 


0.0034 


0.0001 


0.0005 


-0.0002 


-0.0037 




(0.0045) 


(0.0039) 


(0.0057) 


(0.0077) 


(0.0094) 
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Table 2 

Average MISE (4-1) and MSFE (4-2) in 100 simulation runs for the three approaches, 
the values in the parenthesis excludes one outlier occurred in the 6th run 







MISE 






MSFE 




FVE 


AIC 


BIC 


FVE 


AIC 


BIC 


uFPCA 


0.0339 


0.0215 


0.0215 


0.0047 


0.0035 


0.0036 




(0.0325) 


(0.0198) 


(0.0197) 


(0.0067) 


(0.0065) 


(0.0065) 


mFPCA 


0.1075 


0.0077 


0.0076 


0.0039 


0.0024 


0.0025 




(0.0103) 


(0.0063) 


(0.0063) 


(0.0050) 


(0.0017) 


(0.0017) 


fFPCA 


0.0085 


0.0077 


0.0077 


0.0039 


(0.0027) 


0.0027 




(0.0085) 


(0.0077) 


(0.0077) 


(0.0022) 


(0.0015) 


(0.0015) 



outlier run excluded. Not surprisingly, uFPCA is outperformed in general 
by both covariate adjusted approaches. When using the FVE method as a 
criterion of choosing fFPCA is slightly better than mFPCA. However, 
when using AIC or BIC as the criterion of choosing the performance of 
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Fig. 2. Means of the first two eigenfunctions estimated via fFPCA at five different values 
of the covarite. 
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mFPCA is comparable to, if not better than, that of fFPCA. Consequently, 
if the purpose is to predict subject trajectories, mFPCA with BIC is rec- 
ommended owing to its simplicity. For modeling purpose, fFPCA with the 
FVE method is preferred. 

5. Data application. We illustrate the covariate-adjusted FPCA approaches 
through reproductive data for Mexican fruit flies. The study was conducted 
at the fruit fly mass rearing facility near Metapa, Chiapas, Mexico. Daily 
egg production (number of eggs) were recorded for a total of 1151 females 
[Carey et al. (2005)] till death. The goal here is to explore the influence of 
early reproduction, as measured by total reproduction by age 30 (in days), 
to reproduction pattern up to age 50. We exclude the infertile flies and 
those living less than 50 days. The latter is to provide a uniform platform 
to perform FPCA and consider only those who live at least around the av- 
erage lifetime (w50.9 days) of fertile flies. Out of the remaining 567 flies, we 
further randomly selected 2 to 10 observations in the first 50 days, so we 
can compare the results for sparse data with the complete data to validate 
the new mFPCA and fFPCA approaches. In addition, we compare the new 
approaches with three different FPCA approaches that do not incorporate 
the covariate information. The first is the uFPCA in Yao, Miiller and Wang 
(2005), the second is the reduced rank approach in James, Hastie and Suger 
(2000), termed rFPCA with "r" stands for reduced rank, and the third is 
a geometric approach in Peng and Paul (2009) similar to the reduced rank 
method but with a different algorithm. We term this approach "gFPCA" 
with "g" standing for geometric. Both rFPCA and gFPCA assume that X{t) 
is a Gaussian processes, measurement errors are normally distributed, and 
use natural or B-spline bases to expand the eigenfunctions. Both approaches 
aim at maximizing the likelihood function, but rFPCA uses the EM algo- 
rithm to accomplish it and gFPCA tackles the likelihood functions directly 
with a Newton-Raphson method by exploiting the geometric structure of 
the eigenfunctions as they lie on a Stiefel manifold. As rFPCA serves as the 
initial estimates for gFPCA, the original code for rFPCA has been improved 
and included in an R package, fpca, which is available on the CRAN project. 

As suggested in James, Hastie and Suger (2000), the number of bases in 
rFPCA is selected by 10-fold cross-validation likelihood and the number 
of eigenfunctions are reduced by the usual FVE (fraction of variation ex- 
plained) method. For the Mexfly data, it selected 15 bases and the resulted 
numbers of eigenfunctions corresponding to 80% and 90% FVE, as reported 
in Table 3, are 9 and 11, respectively. The choice of the B-spline basis func- 
tions and the number of eigenfunctions for gFPCA are selected by a new 
cross-validated likelihood method proposed in Peng and Paul (2009) and 
they resulted in 8 bases and 5 eigenfunctions. 



COVARIATE ADJUSTED FPCA 



17 



Table 3 

MSFEs of mFPCA, fFPCA, uFPCA and rFPCA with global K and the values in bracket 

are MSFEs based on sparse data 



FVE (80%) FVE (90%) AIC BIC 

MSFE K MSFE K MSFE K MSFE K 



614.1 (465.9) 4 612.8 (447.9) 6 611.8 (433.7) 14 612.0 (436.4) 10 
614.9 (464.4) 4 613.9 (454.4) 5 612.8 (441.3) 11 613.2 (445.7) 7 
684.6 (499.8) 2 684.6 (499.8) 2 680.8 (472.6) 8 680.9 (473.6) 6 

720.2 (136.6) 9 719.1 (131.5) 11 

681.0 (477.3) for A" = 4, 680.8 (472.1) for K = 10, 680.7 (471.6) for A" = 14 

785.1 (648.6) for A" = 5 (based on the CV method), 784.8 (647.1) for is: = 6 



mFPCA 
fFPCA 
uFPCA 
rFPCA 

uFPCA 
gFPCA 



Figure 3 shows the estimated mean surface of mFPCA and fFPCA for 
both sparse and complete data; it indicates that the daily reproductive rates 
are correlated with cumulative reproduction at young age, but that mean 
estimator works well even though data are sparse. Subjects with higher cu- 
mulative reproductions at a young age tend to have higher daily reproductive 
rates. Similar to the mean estimator, the covariance estimator of mFPCA 
also works very well when the data are sparse as Figure 4 shows. The esti- 
mated covariance function based on complete data is not so smooth as that 
based on sparse data because a smaller bandwidth was selected when there 
are substantially more data. 

The mean square fitted errors for the five approaches are reported in Table 
3. The performance of uFPCA, mFPCA and fFPCA are similar to those 
from the simulation study in Section 4; mFPCA is generally slightly better 
than fFPCA for sparse data, and both outperform uFPCA and gFPCA. 
The improvements of mFPCA and fFPCA over uFPCA appear marginal for 
sparse data, but this is due to the large measurement errors (the estimates of 
a by mFPCA, fFPCA, uFPCA are 25.34, 25.44, 24.81, respectively) present 
in the data. Since uFPCA only selects two eigenfunctions, we tried to check 
whether one can improve it by increasing the number of eigenfunctions. 
We use mFPCA as the gauge, and the lower portion of Table 3 reports 
additional results for uFPCA that utilizes the same number [K = 4, 10, and 
14) of components as mFPCA. We also tried to include additional results 
for gFPCA to compare with mFPCA; however, the CV chose 8 bases and 
hence restricts K to < 8. This leads to only one additional case when 
K = Q as the algorithm encountered singularity situation for the case with 
K = 8. 

An intriguing phenomenon is the performance of rFPCA, which by far 
outperforms all other procedures for sparse data but not for the complete 
data where uFPCA, mFPCA and fFPCA all have smaller fitting errors. 
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Fig. 3. Estimated mean surface for sparse and complete data. 



mFPCA (sparse) mFPCA (complete) 




Fig. 4. Estimated covariance surfaces of mFPCA for sparse and compete data. 
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This suggests an over-fitting problem and calls for further investigation. We 
tried to investigate this with simulations but could not reach any conclusion 
using the simulation setting in Section 4. Both algorithms in rFPCA and 
gFPCA encountered singularity situations or could not converge in many 
runs with the divergent problem more serious for gFPCA. It appears that 
the smoothing parameters for both methods are sensitive to the data. 

In summary, this data supports the simpler covariate adjusted approach 
of adjusting just the mean but not the covariance. Additional benefit of the 
mFPCA approach is its computing speed. The computational time of the 
Mexican fruit flies data for fFPCA is 20 times more than mFPCA after the 
bandwidths for the mean and covariances functions have been selected. If 
we include the time to select those bandwidths, the gap is smaller as 10- 
fold CV was used to estimate the covariance functions for both mFPCA 
and fFPCA, leaving the leave-one-out CV for the mean function the most 
time-consuming part of the algorithm. However, the computational cost for 
fFPCA escalates as the total number of observations increases. 

6. Conclusions. Through simulations and data analysis, we have shown 
that current approaches for functional principal component analysis may no 
longer be suitable for functional data when covariate information is avail- 
able. Two alternatives are proposed to incorporate covariate effects on func- 
tional response data, adjusting the covariate effects on the mean function 
only (mFPCA) or adjusting the covariate effects for the covariance as well 
(fFPCA). Numerical evidence supports the simpler mean-adjusted approach 
especially when the purpose is to predict the trajectories Y{t). 

Besides the method itself, the criteria of choosing the number of eigen- 
functions affect performance. Among the three criteria discussed in the sim- 
ulation study, the FVE method based on the fraction of variation explained 
is more likely to pick the correct number of eigenfunctions than the other 
two criteria (AIC and BIC). When model fitting X{t) is the main purpose 
of the data analysis, fFPCA with the FVE criterion is the best choice. How- 
ever, fFPCA is time-consuming and mFPCA is just slightly less efficient 
than fFPCA in fitting X{t) but could be more efficient than fFPCA in pre- 
dicting Y{t), so mFPCA might be an attractive approach to accommodate 
covariates. 

Both FPCA approaches are model free and provide nonparametric esti- 
mates for both the fixed and random effects. The advantages of the principal- 
component based approach are: (1) less random effects are needed to fit the 
data; (2) it has the added value to reveal the modes of variation of the 
data and (3) it provides guidance to other parsimonious models such as a 
varying coefficient model or a linear mixed effects model. Developing formal 
inference procedures using either mFPCA or fFPCA approaches for model 
validation will be important future projects. 
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So far we have considered the theory and implementation for univariate 
covariate Z only, but both can be extended to multivariate Z conceptually 
and theoretically. The catch is the high-dimensional smoothing involved with 
a vector Z . Some dimension reduction on Z will be needed for practical 
implementation and this will be another future research project. 

APPENDIX A: KERNEL FUNCTIONS 

We consider both two and three-dimensional kernels that are symmetric 
with compact support. A kernel function : i?^ — )• i? is of order (i/, k) if 



r 0, < fci ^2 < K, 



(A.l) / / u^H^^K2{u,v)dudv= < 



I/O, ki + k2 = K, 



where u is a multi-index u = (z^i,z^2) and {vl = vi + V2. A kernel function 
K^:B? ^ R\s of order (i/, k) if 

u^^v^'^w^''^K^{u,v,w) du dv dw 
(A.2) ' ' ^ 

3 

0, < y^fct < / i^i, 

i=l 

for z = 1, 2, 3, 
ki = ui,k2 = i^2,k3 = iy3, 
. / 0, ki + k2 + k3 = K, 

where is a multi-index v = (i^i, z^2) ^^3) and = i/i -\- 1^2 + I's- 

APPENDIX B: ASSUMPTIONS 

For bandwidth sequences hi = /ii(n) and /i2 = h2{n), the notation hi x /i2 
means they are of the same order, and, namely, /11//12 stays away from 
and 00. We denote /i^^t and h^^z as the two bandwidth sequences for the 
mean function estimator in the coordinates T and Z. Similarly, hc^t and 
hcz are the two bandwidth sequences for the covariance estimator. The 
assumptions on the bandwidths are listed in A.1-A.4; the assumptions of 
the measurement schedule are in A. 5, A. 6 and A. 7 is a common assumption 
while the covariance is estimated. The bandwidth assumptions and the mea- 
surement schedule assumptions are required to show that the local property 
of the kernel-based estimators holds for longitudinal or functional data with 
the presence of within-subject correlation. 

A.l hfj^^t X hf^^z '^h, /i 0, n/il'^l+^ 00, and nh'^'^^'^ < 00. 
A.2 hct'^ha z'^h, h^ 0, nh\''\+'^ 00, and nh'^'^+^ < 00. 
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A.3 hf,^t ^ V,^ h^O, nE{N)h\''\+'^ oo, E{N)h and nE{N) x 
/i2-+2 < oo. 

A.4 h^t ^ hG,z ^h,h^O, nE{N^)h\''\+'^ oo, E{N'^)h^ and nE{N x 
{N- l))h^''+^ <oo. 

A. 5 The number of observations Ni{n) for the ith subject is a random vari- 
able with Ni[n) ~ N{n) where N{n) is a positive integer- valued random 

variable with limsup„_,oo [J^'^^n)]^ ^^'^ limsup„_,oo {^EN{nYY ^o*'^ ^'^i*^- 
Moreover, Ni{n),i = 1, . . . , n are i.i.d. 
A. 6 The observational times Tij and measurements Yij are independent of 
the number of measurements N{n). 

A. 7 E{{Y - fi{T,Z))'^} <oo. 

For random design, we assume {Tij,Zi,Yij) have the same distribution as 
(T, Z, y) with joint p.d.f. f^{t^z,y)^ and the observation times are i.i.d. 
with p.d.f. f{t), but dependency is allowed among observations from the 
same subject. The joint p.d.f.'s of {T,Z), (Ti, T2, Yi, 12)1 (Ti,T2, Z,Yi,Y2), 
iTi,T2,Ti,n,yi,Y2,Yl, Y^), {Ti,T2,Ti,T^,Z, ^1,^2, Y^, Y^), {n,T2), and (Ti, 
T2,Z) are, respectively, f2{t,z), fi{ti,t2 

t2^yi,y2,y'i,y2), f9{h,t2,t[,t'2,z,yi,y2,y[,y2), giitiM) and gz{ti,t2,z). The 
following regularity conditions for marginal or joint p.d.f.'s and mean or co- 
variance functions along with the bandwidths assumptions are used to show 
the consistency results in Section 3 and Appendices C and D. B.1-B.4 are 
for the asymptotic results of two-dimensional kernel-based estimators while 

B. 5--B.8 are for the asymptotic results of three-dimensional kernel-based 
estimators. 

The following type of continuity, as defined in Yao (2007), will be needed: 

Definition 1. A real function /(x, y) : — ^ is continuous on A C 
uniformly in y G i?™, if given any x £ A and e > there exists a neigh- 
borhood of X not depending on y, say U{x), such that \f{x',y) — f{x,y)\ < e 
for all x' € U{x) and y € i?™. 

B.l ^^fcf^^fca f2{t: z) exists and is continuous on {{t,z)} for fci + ^2 = k, < 

fci, /c2 < K, and f2{t, z) > 0. 
B-2 fs{t,z,y) is continuous on {{t,z)} uniformly in y £ R; ^^fcf^^fcj 

exists and is continuous on {(t, z)} uniformly in y G i?, for ki + k2 = k, 

Q<ki,k2<n- 

B.3 /5(ii,t2; 2/1,2/2) is continuous on {{ti,t2,z)} uniformly in (2/1,1/2) G R^- 
B.4 ^^¥f^^/^(^; -2) exists and is continuous on {(t, 2;)}, for ki + k2 = k, < 
^1, ^2 < 1^- 

B.5 ^^A^i ^gfcg (^2^3 273(^1 •5, -z) exists and is continuous on {{t, s, z)} for fci + /c2 + 
^3 = K, < ki,k2,k3 < K, and 53 (t, s, z) > 0. 
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B.6 /^{t, s, z,yi,y2) is continuous on {{t,s,z)} uniformly in (^1,2/2) G R^', 
^^ferdI^"d?^/5(i,'S,2;,?/i,?/2) exists and is continuous on {{t,s,z)} uni- 
formly in (2/1,2/2), for ki + k2 + ks = K, < fci, fc2, fcs < k. 

B.7 f9{ti,t2,t[,t2,z,yi,y2, y'l , 2/2) is continuous on{{ti,t2, t'l , t2 , z)} uniformly 
in (2/1^2/2, 2/'i, 2/2) e i?^. 

B. 8 ^^fcr^~fej^^r(t, s, z) exists and is continuous on {{t, s, z)}, for fci + /c2 + 

k-^ = K, < ki,k2,k3 < K. 

APPENDIX C: PROOFS OF THEOREMS 3.1 AND 3.2 
Given an integer Q >1 and for q = 1, . . . ,Q, let tpg : ^ R satisfy: 

C. l ipq(t, z,y)^s are continuous on U{{t,z}) uniformly in y £ R; 

C.2 The functions g^pY^^ipqit, z,y) exist for all arguments {t,z,y) and are 
continuous on U{{t,z}) uniformly in y £ R, for pi + P2 = P and < 
Pi,P2 <P- 

The kernel-weighted averages for two-dimensional smoothers are defined 

as 

(C.l) ^gn = ^T^ XT yy MT^i,Z^,Yi^K2 ( , 

where K2 is a kernel function of order (i/, k) [defined in (A.l)], h^^t, and /i^^z 
are bandwidths associated with t and z, respectively. We will see later that 
the Nadaraya- Watson estimator and local linear estimator each involves two 
and four such tpg functions yielding Q = 2 and 4, respectively. Let 

aqit.z) = Q^^^Q^^^ J Mt,z,y)f3{t,z,y)dy 

and 

aqr{t,z) = j ijg{t,z,y)iJrit,z,y)f3{t,z,y)dy\\K2f, 

where /^{t, z, y) is the joint density of (T, Z, y), ||-R'2|P = / K2 and 1 < g, r < 
Q. 

We first provide the asymptotic normality of kernel- weighted averages for 
two-dimensional smoothers based on longitudinal data. Lemma C.l extends 
Theorem 4.1 of Bhattacharya and Miiller (1993) from a univariate smoother 
on independent observations to a bivariate smoother on correlated longitu- 
dinal observations. Lemma C.2 provides the key steps for the asymptotic 
results of the Nadaray-Watson and local linear estimators. 
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Lemma C.l. Under assumptions A. 3, A. 5 and A. 6, B.1-B.4, and C.l 

and C2, 



(C.2) 



n 



V 



iV(0,S) 



Proof. We will show this through Cramer- Wold device and Lindeberg 
CLT. Let 



q=l 

where a^, 1 < q < Q, are given constants. Observing that A = Y17=i ^« where 

Q Ni 



Ui 



1 



VnENh^f,^, 



E E a, {Tij ,Zi,Yij) K2 



t — Tij z — Zi 



Y^'^^nENhll^W.T'E^ 

q=l 



qn 



and Ui^s are i.i.d. mean zero random variables. To verify the Lindeberg 
condition, we need Var([/j), 1 <i <n. First, we show 

7lENhfl-^^hl''^^+^COv{^gn,'^rn) = <Tgr{t, z) + o(l). 

To see this, consider nENh^^l^^hf^l+^co\{'i!qn,'^rn) = h - h where 



h,,th 



-E 



fJ.,t'''ll,Z 



N 



N 



X 



Y,^g{Ti,Z,Yi)K2 



and 



. 1=1 



N 



t-Ti z-Z 



EN 



X E 



1 ^ 

— Y,MTuZ,Yi)K2 



EN 



1=1 



t-T, z-Z 



^^,t ^fJ.,z 
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It is obvious that I2 = o(l). As for Ii, it can be decomposed to Ii = Qi + Q2 
where 



Qi 



1 



1 



-E 



1 / ^ 



EN 



<.j=i 



2 \ , 1 7 



E 



^l;,{T,Z,Y)MT,Z,Y)Kl 



t-T z-Z 



CT2,(t,z)+0(l) 



and 



EN 



{K,tEN) 



X 1pr(t - t2h^^t,Z- h^^zS2,y2)K2{tl, S2)K2{t2, S2)} 

E{N{N-l)) 
(EN)^ 



X |y 'il'g{t,z,yi)'ipr{t,z,y2)f5it,t,z,yi,y2)dyidy2 

X (^j K2{ti,S2)K2{t2,S2)dtidt2dS2^ +o(/l)| = o(l). 

Therefore, we can have Var(C/j) = ^{a^T^a + o(l)) where = (ai, . . . . 
Let i?n = X^iLi Var(C/j) = a^Ha + o(l). In order to apply Lindeberg CLT, 
we need to prove 

i™o ^2 E ^[f^'l{|^.|>.B„}] =0 V6 > 0, 
" i=l 

where Ij.} is an indicator function and it suffices to prove 

Jim nE[C/2l|^2>,2B2|] = 0. 

Using the fact (a + 6)^ < 2a^ + 26^, we can get 



< 2nE< 



1 



nENhf^^th^,z 



.9=1 j = l 



XK2 



t — Tij z — Z\ 
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where 



>^(a'^Sa + o(l))-o(l). 

Observing that the term [^J^^ ^f^^ a^V.^,, Zi, yi,)/^2(^, ^)]' 
is dominated by Z]?=i Ejl^i Zi, yij^V-rlTij, Zi, Yij) x 

^i C h^\^ , x^), and using change of variable, we arrive at 



where T = XlJj^^ I]r=i "g^r-V'gCi-si V*' ^"■^z V^' ^)V'r(i-si V*' ^"^a^^' 
y)K|(si,S2), 7^ = Si, and ^— ^ = S2- So far, we have shown that 

2 

berg condition holds and the proof of the lemma is thus complete. □ 



lim.„_>oo ^(o^Sa + o(l)) = oo for any given e > 0. This implies that Linde- 



Lemma C.2. Let H -.R^ R he a function with continuous first-order 
derivatives, DH{v) = {^^H{v), . . . , ^H{v)f , and N = ^J2"=iNi. Under 

assumptions A. 3, A. 5 and A. 6, B.1-B.4, C.l and C.2, and assuming ~^ 
and nE{N)h'j^-i^'^ — )• for some < p^,r^ < oo, 



^ N{Ph, [DH{ai, . . .,aQ)fnDH{ai, ag)]), 



where T, = {(yqr)i<q,r<i, o-nd 

Yl j s'l's'^'K2{si,S2)dsidS2 



ki+k2=K 

Proof. It suffices to show this theorem with N replaced by E{N) due 
to Slutsky theorem. We first handle the asymptotic bias term by showing 
that 



(C.3) JnE{N)hf^l+'hl'',i+'[H{^in, ^Qn) - H{ai,. . . , ag)] ^ ^h- 
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By conditioning on the value of A^, it is easy to see that 



qnj 



E{i;g{T,Z,Y)K2 



t-T z-Z 



Let j: — = si and I — = S2; it follows from Taylor's expansion of order 
\k\ on ^qn's and Taylor's expansion on H that 



E{^gn) = ag{t,z)+ 



ki+k2=\k\ 



52^/^2(51,52) dsi ds2 



- Q|fc|-(i^l+i^2) 
Qtki-uiQ^k2-U2 



aq{t,z) 



Combining Lemma C.l, the continuity of DH at (ai, . . . , ag)^ and the 
(5-method, we have 



(C.4) 



V 



N{0, [DH{ai, aQ)fnDH{ai,. . . , ag)]). 
The lemma now follows from (C.3) and (C.4). □ 

Proof of Theorem 3.1. Let ■ipi{ui,U2,U3) = U3, ■ip2{ui,U2,us) = I, 
and H{xi,X2) = Xi/x2, then /^nw = -?/(^in, ^2n), DH{ai,a2) = (l/a2, -ai/ai)) 
ai{t,z) = fi{t, z)f2{t, z), a2{t,z) = f2{t,z). Applying the results of Lemma 
C.2, the bias is 



/3nW= XI h Ifc I 1 Si^S2'K2{si,S2)dsidS2 



fcl+fc2=2 



a2{t,z) dt^^dz^-^ 



52 Qi(^,^) 9^ . 

«i(*' ^) - f^. u .^^2 ».k. «2(^, ^) 



(a2(t,z))2 5*^=1 92^=2 



and the components of S are 

an{t, z) = [Var(y|r = t,Z = z) + fi\t, z)]f2{t, z)\\K2f, 

ai2{t,z)=a^^{t,z)=fl{t,z)f2{t,z)\\K2f, C722{t, z) = f2{t, z)\\K2f . 



Therefore, Snw = '^X^ft ^j^'* 11-^2 |P, and the result follows. □ 
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The next lemma follows similar arguments as in Lemma 1 of 
Yao, Miiller and Wang (2005) except that a two-dimensional Fourier trans- 
formation is employed here whereas their arguments involve only a one- 
dimensional Fourier transformation. Define the Fourier transforms of K2{u, v) 
and K3{ui,U2,U3) by 

Ci{t,z) = J J exp{—{iut + iwz))K2{u,w) dudw 

and 

C2(t, s,z) = J J J exp(-(znit + iu2S + iu3z))K3{ui,U2,U3) dui du2 du^. 
They satisfy: 

D.l Ci{t,z) is absolutely integrable; 
D.2 (2{t-,s,z) is absolutely integrable. 

Lemma C.3. Under assumptions A.l, A.5-A.7, B.1-B.4, C.l and C.2 
and D.l, 

sup I "ifqn -aq\ = Op( ^f-y-^j ) where hf, x /i^,t ^< /i^t.z • 

t&T;z&Z V^/l|,' / 

Proof. Since 

e\ sup I'ifqn-agl} <e\ sup {"^ gn - E gn)\} 
^ter;zeZ J ^t(^T;zeZ J 

+ sup \E{'^qn)-ag\ 

and Taylor's expansion implies E{'^qn) = aq + 0{h^~'^^~'^'^) = aq + 0{ tut+2 ) 

It remains to show the correct order of the first term. To this end, we employ 
the inverse Fourier transformation 

C,i{t,z) = j j exp{—iut — iwz)K2{u,w)dudw, 

which implies 

t — T£j z — Zj 



K2 

(C.5) 



1 \ ^ f f f . f t - Tf A , ( z - Zo 
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Let ipqn{u,w) = i E"=i B^Ej^iexp(mr^j + iwZi)i;{Tij, Zii,Yij), and by 
plugging equation (C.5) into "^qn, we obtain 

Therefore, 

sup \^qn- E{^!qn)\ 



X \(pgn{u,w) - E{ipqn{u,w))\\Ci{hf,^tu,hf,^zw)\dudw. 



By the facts that E{\(fqn{u,w) - E{(pqn{u,w))\) < {E{(fqn{u,w) - E{(fqniu, 
«;))}2)V2 and {T^, Y^, A^J are i.i.d. where = (T^, . . ■ ,TeN,V and = 
(Yii , . . . , YiNi )'^, we have 

Vav{ipqn{u,w)) 



< - 

n 



1 f 1 ^ r 

- {e{n)) (E«^P(^2ur, +z2zz;Z)'| (s^i^'qiT,, Z,Y,) 



<.j=i / \i=i 



1 



-Eirq{T,Z,Y)), 

where the second inequahty follows from Cauchy-Schwarz inequality. The 
lemma now follows from 



e( sup l^^qn-E^qnl} 



E{\ipqn{u,w) - E{ipqn{u,w))\}\C^X {h^,,tU, h^^^w) \ du dw 



< 



1 jEi^P^g {T,Z,Y))ff\Ci{u,w)\dudw 
< .. = 



v^Kitx^r Kv^hV-'y' □ 

Proof of Theorem 3.2. Let 



COVARIATE ADJUSTED FPCA 29 



^ J 

where Wi-\ = , , — ( ^ ^\ t— It can be shown that 

-Roo — PiSio — /325'io 







'S'oo 
where 

-i?oo('S'io5'o2 — SiiS'oi) + -Rio(5'oo5'o2 — 5'oi) ~ -Roi('S'oo5'ii — ^oiSio) 



/5i 

/32 



'S'oo(5'iO'S'o2 — 'S'li) — 'S'io('S'io5'o2 — •S'uS'oi) + Soi{SioSu — S'oi5'2o) 
-Roo(5'iO'S'ii — 'S'025'20) — -Rio('S'oo5'ii — SiqSqi) + -Roi('S'oo'S'2o — 5*10) 



5'oo(5'io5'o2 — "S^i) — »S'io(5'io5'o2 — SuS'oi) + 5'oi(5io-S'ii — 501520) 

Applying Lemma C.2 and Slutsky's theorem repeatedly, we can show through 
tedious calculations and Theorem 3.1 that \Bi — 0i \ = 0„( , ]^^^ , ) and 

\/'n-E{N)hj,' 

\h- h\= Op^ ^nE{N)hi ^^^^^ V ^ ^A*,* ^ V,^- These results imply that 



lim ^nNhliiflLit, z) - fx{t, z)) - {fl{t, z) - fx{t, z))] ^ 0, 

where fl{t, z) = [i?oo — /5i5io — /325io]/5oo- It suffices to show Theorem 3.2 for 
fl{t,z) instead of jlL{t,z), and this follows from setting //(xi, X2, X3, 0:4) = 

[X2- (3lX3 + PitXi- 132X4^ + (32ZXi]/xi, Vl (-"l , '"2, '"s) = 1, ^2 (^^l , ^^2 , ^s) = ""3 , 

ip3{ui,U2, U3) = ui, and V'4('"i) ^^2, ^^3) = U2 in Lemma C.2. □ 

APPENDIX D: PROOFS OF THEOREMS 3.3 AND 3.4 
For an integer Q>1, let : i?^ — )• i? for q = 1, . . . ,Q satisfy: 
C.3 '&q{t, s, z,yi, 1/2)^8 are continuous on U{{t, s, z}) uniformly in (yi,y2) G 

C.4 The functions gjpTg^g^^?q(t, 2;, yi, 2/2) exist for all arguments {t,s,z, 
2/1,2/2) and are continuous on U{{t, s, z}) uniformly in (1/1,2/2) £ R^, for 
Pi+P2+P3=P and 0<pi,p2,P3 <P- 

The general weighted averages of three-dimensional smoothing methods are 
defined as 

" n£;(iV(iV-l))/i^^+'^^+2^g,+i 

n 

(D.l) MTij,Tik,Zi,Yij,Y,k) 

1=1 l<jf^k<Ni 



t Tij s Ti}^ z Zi 



hct hct hG,z 
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where is a kernel function of order {v^k) [see (A. 2)]. Let 



QfUi QgU2 Qz^^S 

X / ^q{t,s,z,yi,y2)f5it,s,z,yi,y2)dyidy2, 



UJqr = J '&q{t,S,Z,yi,y2)T9r{t,S,Z,yi,y2) 

X f5{t,s,z,yi,y2)dyidy2\\K3\\^, 

where hit, s, z,yi,y2) is the joint density of {Ti,T2, Z,Yi,Y2), Ilia's |P = / 
and 1 < q,r < I. 

Next, we provide the longitudinal version of the asymptotic normality 
property of kernel-weighted averages for three-dimensional smoothers. 

Lemma D.l. Under Assumption A.4-A.6, B.5-B.8, C.3 and CA, 



+1 

G,z 



(D.2) X {(ei„, . . . , eQnf - {Ei@in), E{eQn)y} 

Proof. The proof follows similar framework as in Lemma C.l with 
appropriate modifications for three-dimensional smoothers. □ 

Lemma D.2. Let H :R^ R he a function with continuous first order 
derivatives, DH{v) = {^^H{v), ^H{v)Y , and N = ^ J2i=i ^i- Under 

Assumption A.4~A.6, B.5-B.8, C.3 and CA, ^ ^ pc and nE{N{N - 

l))h'^Q'^^ — > Tq for some < pc-, tq < oo, we obtain 



nN{N - l)h'aT hZ {^{Om, Qqu) - H{^i, ^q)} 
^ N{jH, [DHi^u . . .,^Q)fn[DH{^,, . . . ,^q)]), 
where il = {oJqT)i<q,r<Q and 

^ ( (—1)'^ f ^ 
7H = E E I — I u'l'u2^ul^K-i{ui,U2,U3)duidu2du:i\ 
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X f5{t,s,z,yi,y2)dyi dy2 



Proof. The framework of this proof is similar to that of Lemma D.l. 

□ 

Lemma D.3. Under assumptions A. 2, A.5-A.7, B.5-B.7, C.3 and C.4 
and D.2, 

sup \Qqn-iq\=Op{ X^HJU where he ^ h^t >^ ha,z- 

t,seT;zezo \^/nh}Q^^ J 

Proof. The proof is very similar to the proof of Lemma C.3. □ 

Lemma D.4. Under assumptions A.l and A. 2, A.5~A.7, B.1-B.3, B.5- 
B.7 and D.l and D.2 for K2, we have defined as (A.l) of order (0,2), and 
K3 defined as (A. 2) of order (0,2), 

sup \fiL(.t,z) - fl{t,z)\=Op( ^, ^ , — 

1 



sup \TL{t,s,z)-T{t,s,z)\ = Op 



^/nh'^^hc,. 



Proof. Apply Lemma C.3 to the Nadaraya-Watson estimator /iNw(*5 
by choosing il}i{t,z,y) = y, ip(^t,s,y) = 1, and H{xi,X2) = one can obtain 



sup \f{t,z)-fit,z)\ = Op 



( 1 



sup \fij^w{t,z)-fi{t,z)\=Op(—l- 
teT;zez V Vrihf, ) 

Similar to the proof of Theorem 3.2, one can rewrite flL{t,z) as 



flL{t,z)= flf^w(t,z) - glO- y/^^ 5*10, 

f{t,z) f[t,z) 
where is defined in the proof of Theorem 3.2 and show that 

sup /3i\=Op( —Ir-r] and sup |^2-/32|=Op, 

teT;zez \Vnh-f,J teT;zez W^h 

Thus, supt(,'j-,zez\fj'L{t, z) - fi{t, z)\ = Op{^^). 



i3 



32 



C.-R. JIANG AND J.-L. WANG 



The uniform convergence rate of the covariance estimator simply replaces 
Lemma C.3 with Lemma D.3. □ 

Proofs of Theorems 3.3 and 3.4. From Lemma D.4, we know that 
supt J fi{t,z) - fi{t,z)\ = Op{-^j^) for both fi^^{t,z) and fiLit.z). Let 

'&i{ti,t2,z,yi,y2) = {yi - ^(ti,z))(y2 - Kt2,z)), i?2(ii,i2, 2,^1,^2) = {vi - 
fi{ti,z)), and '&3{ti,t2,z,yi,y2) = I, then sup^ jS^nl = Op(l), g = 1,2,3, 

by Lemma D.3. Thus, we can obtain sup^ ^ |Qgn|Op( ) = Op( ) for 

g = 2, 3. By the fact that 

Cijk = Cijk + (Yij — ^{Tij, Zi))[fi[Tik, Zi) — jl{Tik, Zi)) 
+ [Yik - KTik, Zi)){fj.{Tij,Zi) - fi{Tij,Zi)) 
+ {fi{Tij,Zi) - fi{Tij,Zi)){ii{Tik,Zi) - fi{Tik,Zi)), 

and sup^^^\fl{t, z) — fJ,{t, z)p = is negligible compared to TnwI*) 

s,z) and TL{t,s,z) obtained via smoothing Cijk are asymptotically equiva- 
lent to those, denoted by ^^^Jf^{t, s, z) and TL{t,s,z), respectively, obtained 
via smoothing Cijk- Therefore, it suffices to show the asymptotic distribu- 
tions of Tj<fy/{t,s,z) and rL{t,s,z). 

Theorem 3.3 now follows from Lemma D.2 by letting 'di{t,s,z,yi,y2) = 
{yi - fi{t,z)){y2 - fi{s,z)), ^2{t,s,z,yi,y2) = 1, and H{xi,X2) = f^. 

Theorem 3.4 follows from similar arguments as in the proof of Theorem 
3.2. □ 

Proof of Theorem 3.5. To show the asymptotic results of the mF- 
PCA covariance estimator, we need the following regularity conditions for 
the pooled covariance function and some joint p.d.f.'s: 

E.l ^jffcf ^gfc2 ^2(^1 s) exists and is continuous on {{t,s)} for ki + k2 = k, < 

ki,k2 < K, and 52(^1 s) > 0; 
E.2 /4(t, s, yi, 2/2) is continuous on {(t, s)} uniformly in (2/1,2/2) £ R^', 

^^fe^^^/4(t, s, 2/1, 2/2) exists and is continuous on {(t, s)} uniformly in 

(2/152/2) G R^, for ki + k2 = K, < /ci, ^2 < 1^] 
E.3 /8(ii,*2,ii,i2, 2/1, 2/2, 2/1,2/2) is continuous on {(ti, t2, i'l, ^2)} uniformly in 

(2/1,2/2,2/^,2/2) e R^. 

Since the covariance estimator of mFPCA involves two-dimensional smooth- 
ing, the theoretical properties of covariance estimator in Theorem 1 in 
Yao, Miiller and Wang (2005) and Theorem 2 in Yao (2007) can be ap- 
plied directly. Thus, under assumptions A.5-A.7, D.l, E.1-E.3, he* — )• 0, 
n/igr* 00 and n/ig* < 00, one could obtain that 

sup If* (t, s) - r*(t, s)\ = oJ-}-^\ . 
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